# 16kHz audio adaptation
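
Several of the models listed here state that they expect 16 kHz input, so audio recorded at another rate has to be resampled first. The following is a minimal sketch of that step, assuming a mono signal held as a plain list of floats. The function name `resample_linear` is hypothetical, and linear interpolation is used only to keep the example dependency-free. It applies no anti-aliasing filter, so a band-limited resampler from a dedicated audio library is the better choice in practice.

```python
def resample_linear(samples, src_rate, dst_rate=16_000):
    """Resample a mono signal by linear interpolation (illustrative only)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples or src_rate == dst_rate:
        return list(samples)

    # Number of output samples that cover the same duration.
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    # How far we advance in the source signal per output sample.
    step = src_rate / dst_rate
    last = len(samples) - 1

    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), last)   # left neighbour
        hi = min(lo + 1, last)     # right neighbour, clamped at the end
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


if __name__ == "__main__":
    # A 32 kHz ramp reduced to 16 kHz keeps every second sample.
    print(resample_linear([0.0, 1.0, 2.0, 3.0], 32_000))
```

The resampled list can then be passed to whichever feature extractor the chosen model uses, with the sampling rate declared as 16000.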

Viwav2vec2 Base 3k
A Wav2Vec2 base model pre-trained on 3,000 hours of Vietnamese speech. It targets Vietnamese speech recognition and must be fine-tuned on a downstream task before use.
Speech Recognition Transformers Other
dragonSwing
41
2
Data2vec Audio Large 100h
Apache-2.0
Data2Vec is a general self-supervised learning framework applicable to speech, natural language processing, and computer vision tasks. This model is a large-scale model pre-trained and fine-tuned on 100 hours of Librispeech audio data.
Speech Recognition Transformers English
facebook
46
2
Wav2vec2 Xlsr Multilingual 53 Fa
A speech recognition model based on the multilingual wav2vec 2.0 architecture and fine-tuned specifically for Persian, described as significantly reducing word error rate.
Speech Recognition Transformers
masoudmzb
83
7
Wav2vec2 Large Xlrs Estonian
Apache-2.0
This is an automatic speech recognition (ASR) model fine-tuned on the Estonian Common Voice dataset, based on the facebook/wav2vec2-large-xlsr-53 model.
Speech Recognition Other
birgermoell
18
0
Wav2vec2 Base Hr Voxpopuli V2
Speech model based on Facebook's Wav2Vec2 architecture, pre-trained on the Croatian VoxPopuli corpus
Speech Recognition Transformers Other
facebook
30
1
Wav2vec2 Large Xlsr 53 Breton
Apache-2.0
A Breton fine-tuned speech recognition model based on facebook/wav2vec2-large-xlsr-53
Speech Recognition Other
mrm8488
26
0
Wav2vec2 Large Xlsr 53 Hungarian
Apache-2.0
This is a Hungarian automatic speech recognition model fine-tuned from the facebook/wav2vec2-large-xlsr-53 model, trained using the Common Voice dataset.
Speech Recognition Other
anton-l
17
0
W2v Hf Commonvoice From Xlsr53 Pretrain 0329UTC1500
A Japanese speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Common Voice Japanese dataset.
Speech Recognition Transformers
qqpann
15
0
Wav2vec2 Large 960h Lv60
Apache-2.0
Wav2Vec2 is a powerful speech recognition model that extracts features from raw audio through self-supervised learning and achieves high-performance speech recognition with limited labeled data.
Speech Recognition English
facebook
7,011
6
Wav2vec2 Large Xlsr Georgian
Apache-2.0
Georgian automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, supporting 16kHz sampled audio input
Speech Recognition Transformers Other
xsway
14.80k
1
Wav2vec2 Large Xlsr 53 Chuvash
Apache-2.0
A Chuvash automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, trained on the Common Voice dataset with a word error rate of 40.01%.
Speech Recognition Other
anton-l
30
0
Wav2vec2 Large Xlsr 53 German
Apache-2.0
This is a fine-tuned XLSR-53 large model for German speech recognition tasks, based on Facebook's wav2vec2-large-xlsr-53 model and fine-tuned on the Common Voice 6.1 German dataset.
Speech Recognition German
jonatasgrosman
8,266
7
Wav2vec2 Base Vn 270h
A speech recognition model fine-tuned with approximately 270 hours of Vietnamese annotated data, supporting Vietnamese automatic speech recognition tasks
Speech Recognition Other
dragonSwing
202
8
Wav2vec2 Large Superb Ks
Apache-2.0
A speech classification model fine-tuned on the SUPERB keyword spotting task, based on the Wav2Vec2-Large-LV60 pre-trained model
Speech Recognition Transformers English
superb
18
1
Wav2vec2 Large Xlsr 53 Estonian
Apache-2.0
An automatic speech recognition model fine-tuned for Estonian using the Common Voice dataset, based on facebook/wav2vec2-large-xlsr-53
Speech Recognition Transformers Other
vasilis
26
0
Wav2vec2 Base Da Voxpopuli V2
A speech model based on Facebook's Wav2Vec2 architecture, pre-trained only on Danish using 13.6k hours of unlabeled data from the VoxPopuli corpus.
Speech Recognition Transformers Other
facebook
35
0
Wav2vec2 Large Xlsr 53 Estonian
Apache-2.0
Estonian speech recognition model fine-tuned from Facebook's XLSR-53 large model, achieving 30.74% word error rate on Common Voice dataset
Speech Recognition Other
anton-l
3,259
0
Wav2vec2 Xlsr 53 Tamil
Apache-2.0
A Tamil speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, trained on the Common Voice Tamil dataset.
Speech Recognition Other
anuragshas
64
0
Wav2vec2 Large Xlsr 53 Spanish
Apache-2.0
A Spanish speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, trained on the Common Voice 6.1 Spanish dataset
Speech Recognition Spanish
jonatasgrosman
46.28k
30
Wav2vec2 Large West Germanic Voxpopuli V2
Facebook's Wav2Vec2 large model, pretrained exclusively on 66.3k hours of unlabeled West Germanic speech from the VoxPopuli corpus.
Speech Recognition Transformers
facebook
25
1
Wav2vec2 Large El Voxpopuli V2
A Greek speech model pretrained on 17.7k hours of unlabeled data from the VoxPopuli corpus.
Speech Recognition Transformers Other
facebook
24
0
Sew D Tiny 100k
Apache-2.0
SEW-D is a compressed and efficient speech pre-training model developed by ASAPP Research, pre-trained on 16kHz sampled speech audio, suitable for various downstream speech tasks.
Speech Recognition Transformers English
asapp
1,074
2
Wav2vec2 Large Xlsr 53 Mongolian
Apache-2.0
A Mongolian automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Common Voice Mongolian dataset.
Speech Recognition Transformers Other
tugstugi
251
0
Wav2vec2 Large Fr Voxpopuli French
Apache-2.0
A French speech recognition model fine-tuned from facebook/wav2vec2-large-fr-voxpopuli, trained on the Common Voice 6.1 French dataset, supporting 16kHz audio input
Speech Recognition French
jonatasgrosman
51
3
Wav2vec2 Large Xlsr 53 Sakha
Apache-2.0
A Yakut (Sakha) speech recognition model fine-tuned from the XLSR-53 large model, with a word error rate of 32.23%.
Speech Recognition Other
anton-l
25
0
Wav2vec2 Large Xlsr 53 Vietnamese
Apache-2.0
A Vietnamese automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, supporting audio input sampled at 16kHz.
Speech Recognition Transformers Other
not-tanh
22
4
Wav2vec2 Large Xlsr Vietnamese
Apache-2.0
Vietnamese automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53
Speech Recognition Other
Nhut
22
0
Wav2vec2 Large Xlsr 53 Lithuanian
Apache-2.0
A Lithuanian speech recognition model fine-tuned from Facebook's XLSR-53 large model, trained on the Common Voice dataset with a test WER of 56.55%.
Speech Recognition Other
DeividasM
4,105
1
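
Several entries above quote a word error rate (WER), for example 40.01% for Chuvash and 56.55% for Lithuanian. As a rough illustration of what that figure measures, the sketch below computes WER for a single reference/hypothesis pair as word-level edit distance divided by the number of reference words. The function name `word_error_rate` is hypothetical, and whitespace tokenisation with no text normalisation is an assumption. The listed models may have been scored with different preprocessing and over a whole test set rather than one sentence.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

A returned value of about 0.40 would correspond to a reported WER of roughly 40%. Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.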